ABSTRACT

Summary: PONDEROSA (Peak-picking Of Noe Data Enabled 
by Restriction of Shift Assignments) accepts input information 
consisting of a protein sequence, backbone and sidechain 
NMR resonance assignments, and 3D-NOESY (13C-edited and/or
15N-edited) spectra, and returns assignments of NOESY cross peaks,
distance and angle constraints, and a reliable NMR structure 
represented by a family of conformers. PONDEROSA incorporates 
and integrates external software packages (TALOS+, STRIDE and 
CYANA) to carry out different steps in the structure determination. 
PONDEROSA implements internal functions that identify and validate 
NOESY peak assignments and assess the quality of the calculated 
three-dimensional structure of the protein. The robustness of 
the analysis results from PONDEROSA's hierarchical processing
steps that involve iterative interaction among the internal and 
external modules. PONDEROSA supports a variety of input formats: 
SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), 
XEASY proton file format (.prot), and NMR-STAR format (.star).
To demonstrate the utility of PONDEROSA, we used the package 
to determine 3D structures of two proteins: human ubiquitin and 
Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The
automatically generated structural constraints and ensembles of 
conformers were as good as or better than those determined 
previously by much less automated means. 

Availability: The program, in the form of binary code 
along with tutorials and reference manuals, is available at 
http://ponderosa.nmrfam.wisc.edu/. 

Contact: whlee@nmrfam.wisc.edu; markley@nmrfam.wisc.edu 
Supplementary information: Supplementary data are available at 
Bioinformatics online. 
1 INTRODUCTION

A major challenge of structural biology is to close the gap 
between known sequences of proteins [>1 × 10^8 in GenBank
(Benson et al., 2008)] and their 3D structures (~1 × 10^5 in
PDB; Berman et al., 2000). Automation now plays a key role
in speeding up the determination of protein structures by X-ray 
crystallography. However, the determination of protein structures 
by NMR spectroscopy includes a larger number of steps that 
present greater challenges for automation. The steps are largely
sequential; however, some of them may need to be iterated in order
to yield a satisfactory protein structure. Software packages have 
been developed to automate individual steps, and in some cases 
to pipeline several steps (Bahrami et al., 2009; López-Méndez and
Güntert, 2006). One of the challenges has been to automate the final
steps beyond backbone and sidechain peak assignment, including 
the determination of torsion angle constraints, the assignment of 
NOESY cross peaks and the determination of distance constraints, 
the analysis of secondary structure, and the calculation of a validated 
3D protein structure. The PONDEROSA (Peak-picking Of Noe Data 
Enabled by Restriction of Shift Assignments) software package 
described here bridges this gap and is meant to be used with 
an automated resonance assignment package such as PINE-NMR 
introduced earlier by our group (Bahrami et al., 2009). 



2 IMPLEMENTATION 

PONDEROSA (Supplementary Fig. S1A) accepts resonance
assignments in popular file formats [SPARKY (T.D. Goddard
and D.G. Kneller, SPARKY 3, University of California,
San Francisco), XEASY (Bartels et al., 1995) or NMR-STAR
(http://www.bmrb.wisc.edu/dictionary/)], an amino acid sequence
file in either one- or three-letter code, and 13C-NOESY and/or
15N-NOESY datasets in SPARKY (.ucsf) format. By integrating
internal functions and external programs, PONDEROSA provides 
as output NOE peak lists, NOE assignments, structural constraints 
and a family of conformers representing the 3D structure. 

Internal functions: The major internal functions of PONDEROSA 
simulate and validate NOESY peaks and manage interactions among 
the internal and external software routines. PONDEROSA uses 
available resonance assignments to simulate all possible short-,
medium- and long-range peaks (Supplementary Figs S2 and S3).
Members of the set of simulated peaks are validated by comparing 
them to peaks detected in the experimental NOESY datasets under 
different threshold levels. The sets of validated peak lists are 
provided to the external programs that determine torsion angle 
restraints, assign NOESY peaks, calculate structures and analyze 
secondary structure. The results from these programs are recycled 
to PONDEROSA for the next iteration (Supplementary Fig. S4). 
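
The simulate-and-validate step described above can be illustrated with a
minimal sketch. It is not PONDEROSA's actual implementation: the function
names, the matching tolerances and the toy data are assumptions made for
illustration. Candidate 15N-edited NOESY peak positions are enumerated from
assigned shifts, and a candidate is kept only if an experimental peak above
the intensity threshold lies within the tolerance in every dimension.

```python
from itertools import product

# Hypothetical per-dimension matching tolerances in ppm (assumed values).
TOLERANCES = (0.03, 0.03, 0.3)  # (NOE 1H, attached 1H, heavy atom)


def simulate_peaks(proton_shifts, attached_pairs):
    """Enumerate candidate 3D peak positions (noe_h, attached_h, heavy).

    proton_shifts: dict mapping proton name -> 1H shift (ppm)
    attached_pairs: list of (proton name, 1H shift, heavy-atom shift)
    """
    peaks = []
    for (h_name, h_shift, x_shift), (noe_name, noe_shift) in product(
        attached_pairs, proton_shifts.items()
    ):
        if noe_name != h_name:  # skip the diagonal peak
            peaks.append(((noe_shift, h_shift, x_shift), (noe_name, h_name)))
    return peaks


def validate_peaks(simulated, experimental, threshold, tolerances=TOLERANCES):
    """Keep simulated peaks that match an experimental peak above threshold."""
    strong = [pos for pos, intensity in experimental if abs(intensity) >= threshold]
    validated = []
    for position, label in simulated:
        for candidate in strong:
            if all(abs(p - c) <= t for p, c, t in zip(position, candidate, tolerances)):
                validated.append((position, label))
                break  # one supporting experimental peak is enough
    return validated


# Toy data, invented for illustration only.
protons = {"A1.HN": 8.20, "A1.HA": 4.35, "G2.HN": 8.45}
amides = [("A1.HN", 8.20, 121.5), ("G2.HN", 8.45, 109.0)]
picked = [((4.36, 8.21, 121.4), 5.0e5), ((8.19, 8.45, 109.1), 2.0e4)]

simulated = simulate_peaks(protons, amides)
loose = validate_peaks(simulated, picked, threshold=1.0e4)
strict = validate_peaks(simulated, picked, threshold=1.0e5)
```

In this toy case, raising the threshold removes the weaker experimental peak,
so fewer simulated peaks are validated. Each threshold level therefore yields
a different validated peak list for the external programs.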

PONDEROSA examines the effect of the threshold level on a
structural quality score that incorporates the root mean square
deviation (RMSD) of backbone atoms in structured regions as
determined by STRIDE, the number of constraint and van der
Waals violations, and the number of residues in favored and disallowed
Ramachandran regions. If both 13C- and 15N-edited NOESY
data are present, PONDEROSA interactively determines optimal
thresholds for each.
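
The text names the ingredients of the quality score but does not give its
formula. The sketch below is therefore only one possible reading: the linear
form, the weights, the field names and the numbers are all assumptions. It
shows how a score of this kind could be used to choose among threshold levels.

```python
from dataclasses import dataclass


@dataclass
class StructureStats:
    """Summary statistics for one calculated ensemble (field names assumed)."""
    backbone_rmsd: float        # in Å, structured regions only
    constraint_violations: int
    vdw_violations: int
    favored_fraction: float     # fraction of residues in favored regions
    disallowed_fraction: float  # fraction of residues in disallowed regions


def quality_score(stats, w_rmsd=1.0, w_viol=0.1, w_rama=2.0):
    """Lower is better. Weights and linear form are illustrative assumptions."""
    penalty = w_rmsd * stats.backbone_rmsd
    penalty += w_viol * (stats.constraint_violations + stats.vdw_violations)
    penalty += w_rama * (stats.disallowed_fraction - stats.favored_fraction)
    return penalty


def best_threshold(results):
    """Return the threshold level whose ensemble has the lowest score.

    results: dict mapping threshold level -> StructureStats
    """
    return min(results, key=lambda level: quality_score(results[level]))


# Invented numbers, for illustration only.
trials = {
    1.0e4: StructureStats(1.20, 14, 6, 0.80, 0.04),
    5.0e4: StructureStats(0.35, 3, 1, 0.92, 0.01),
    2.0e5: StructureStats(0.90, 1, 0, 0.85, 0.02),
}
chosen = best_threshold(trials)
```

With 13C- and 15N-edited data both present, the same comparison could be run
over pairs of threshold levels, one per spectrum.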

External programs: PONDEROSA interacts with TALOS+ (Shen
et al., 2009) for identifying structured regions and for determining
torsion angle restraints from assigned chemical shifts, STRIDE
(Frishman et al., 1995) for analyzing secondary structure, and
CYANA (Güntert, 2004) for assigning NOESY cross peaks and
calculating 3D structures.

Graphical User Interface: An intuitive graphical user interface
(Supplementary Fig. S1B) enables specification of the number of
CPU nodes, steps and cycles to be used in CYANA iterations, the
limit on the number of NOESY peaks to be searched for on the basis
of local peak maxima, and the weighting factors for RMSD, distance
violations and torsion angle dispersions.

3 RESULTS AND CONCLUSION 

We selected two proteins to illustrate the use of PONDEROSA 
for NOESY peak picking and automated structure determination: 
human ubiquitin (76 residues) and Escherichia coli iron-sulfur 
scaffold protein variant IscU(D39A) (128 residues). We chose 
human ubiquitin because it is a well-known test sample for protein 
NMR technology development with 3D structures deposited in 
the Protein Data Bank (PDB), e.g. 1D3Z (Cornilescu et al.,
1998). We chose IscU(D39A) (Kim et al., 2009) because it
is a larger protein with a recently deposited non-automatically 
derived NMR structure (PDB 2KQK) that exhibited variation in 
the position of secondary structural elements within the family of 
20 conformers. In determining the structures of both proteins, we
used 1H-15N HSQC, 1H-13C HSQC, CBCA(CO)NH, HNCACB
and HBHA(CO)NH datasets for backbone assignments, and
(H)CC(CO)NH, H(CC)(CO)NH and HCCH-TOCSY datasets for
sidechain assignments. We used NMRPipe (Delaglio et al., 1995) to
process all spectra and then converted the spectra to SPARKY (.ucsf)
files. We used PINE-NMR and PINE-SPARKY (Lee et al., 2009)
to assign the spectra of human ubiquitin, but assigned IscU(D39A)
by a manual assignment strategy. We processed 3D 13C-NOESY
and 15N-NOESY datasets with NMRPipe and converted the spectra
to .ucsf files for input to PONDEROSA. The total times required
for the structure determinations with 24 CPUs were 9 h for human
ubiquitin and 15 h for IscU(D39A).

The 20 best conformers of human ubiquitin determined by
PONDEROSA (Supplementary Fig. S1C) had an RMSD of 0.09 Å
for backbone atoms and 0.48 Å for all heavy atoms in structured
regions. The 20 best conformers of IscU(D39A) determined by
PONDEROSA had an RMSD of 0.20 Å for backbone atoms
and 0.61 Å for all heavy atoms in structured regions. The
structures determined by PONDEROSA were very similar to those
determined earlier by more manual approaches: 1.15 Å RMSD for
the backbone atoms of human ubiquitin (PONDEROSA versus
1D3Z) and 1.30 Å for structured backbone atoms of IscU(D39A)
(PONDEROSA versus 2KQK) (Supplementary Fig. S1D). Analysis
by two standard validation suites, PSVS (Bhattacharya et al., 2007)
and iCing (http://nmr.cmbi.ru.nl/icing/#welcome), revealed that the
PONDEROSA-derived structures were of equivalent quality to the 
structures of the same proteins in the Protein Data Bank (1D3Z and 
2KQK) determined by less automated means. 
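
For readers unfamiliar with how ensemble RMSD values such as those above are
obtained, the following is a minimal sketch of one common convention: the
mean RMSD of each conformer to the average structure. It assumes the
conformers have already been superimposed and that only the atoms of interest
(for example, backbone atoms in structured regions) are passed in. The
function names and coordinates are illustrative and do not come from
PONDEROSA, whose exact RMSD convention is not stated in the text.

```python
import math


def mean_structure(conformers):
    """Average the coordinates atom by atom over all conformers."""
    n = len(conformers)
    return [
        tuple(sum(conf[i][k] for conf in conformers) / n for k in range(3))
        for i in range(len(conformers[0]))
    ]


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a - b) ** 2 for pa, pb in zip(coords_a, coords_b) for a, b in zip(pa, pb)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_rmsd(conformers):
    """Mean RMSD of each conformer to the average structure."""
    mean = mean_structure(conformers)
    return sum(rmsd(conf, mean) for conf in conformers) / len(conformers)


# Two toy conformers of three atoms each (invented coordinates, in Å).
conf_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
conf_2 = [(0.0, 0.2, 0.0), (1.5, 0.2, 0.0), (3.0, 0.2, 0.0)]
value = ensemble_rmsd([conf_1, conf_2])
```

A comparison between two different structures (such as a calculated ensemble
versus a deposited one) would additionally require an optimal superposition
step before calling `rmsd`, which this sketch omits.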